A Hybrid Random Forests-Boruta Feature Selection Algorithm for Biodegradability Prediction

Authors

  • Zhe F. Liu
  • Hedia Fgaier
  • Stanislav Y. Ivanov
  • Ali Elkamel
  • Xiang H. Meng
  • Suo Q. Zhao
Abstract

A priori knowledge of biodegradability saves time and money in the research and design of new products. Environmental organizations have encouraged the use of quantitative structure-activity relationship (QSAR) models as a tool for predicting the biodegradability of chemicals. This work proposes a new algorithm for investigating the importance of the chemical descriptors used as input variables when modeling and predicting biodegradability. The algorithm produces an ensemble of feature subsets that trade off model complexity against generalization performance. It uses random forests as the classifier, coupled with the Boruta algorithm, to automatically rank and omit descriptors based on their Z-scores. It is shown how the four least relevant variables were identified and removed from the model while preserving its generalization ability. Furthermore, a hybrid feature selection method is developed that inspects weakly relevant features and omits them iteratively so that the classifiers retain their generalization ability. The prediction accuracy of the new model showed improvements compared to previous ...
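As a rough illustration of the shadow-feature idea behind Boruta-style selection, and not the authors' implementation, the following Python sketch ranks features by Z-score against shuffled "shadow" copies. Several things here are assumptions made for the sake of a self-contained example: the random forest importance is replaced by a much simpler stand-in (the absolute difference between class means, computed over bootstrap resamples), the data are synthetic, the decision rule is a plain binomial tail test without multiple-testing correction, and all function names (`make_toy_data`, `importance`, `z_scores`, `boruta_sketch`, `decide`) are hypothetical.

```python
import math
import random
import statistics


def make_toy_data(n=200, seed=1):
    """Synthetic data (an assumption): one informative column, two noise columns."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]
    informative = [3.0 * y + rng.gauss(0, 1) for y in labels]
    noise_a = [rng.gauss(0, 1) for _ in labels]
    noise_b = [rng.gauss(0, 1) for _ in labels]
    return [informative, noise_a, noise_b], labels


def importance(col, labels):
    """Stand-in for random forest importance: |mean(class 1) - mean(class 0)|."""
    pos = [x for x, y in zip(col, labels) if y == 1]
    neg = [x for x, y in zip(col, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    return abs(statistics.fmean(pos) - statistics.fmean(neg))


def z_scores(columns, labels, rng, n_resamples=30):
    """Z-score per column: mean importance over bootstrap resamples / its std. dev."""
    n = len(labels)
    scores = [[] for _ in columns]
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        for j, col in enumerate(columns):
            scores[j].append(importance([col[i] for i in idx], ys))
    return [statistics.fmean(s) / (statistics.stdev(s) or 1e-12) for s in scores]


def boruta_sketch(columns, labels, rounds=20, seed=0):
    """Count, per real feature, the rounds in which it beats the best shadow."""
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(rounds):
        # Shadow features: shuffled copies that keep each column's distribution
        # but break any relationship with the labels.
        shadows = [rng.sample(col, len(col)) for col in columns]
        z = z_scores(columns + shadows, labels, rng)
        real_z, shadow_z = z[:len(columns)], z[len(columns):]
        threshold = max(shadow_z)
        for j, zj in enumerate(real_z):
            if zj > threshold:
                hits[j] += 1
    return hits


def decide(hits, rounds, alpha=0.01):
    """Simplified binomial test (p = 0.5) on the hit counts."""
    def tail(lo, hi):
        return sum(math.comb(rounds, k) for k in range(lo, hi + 1)) / 2 ** rounds

    verdicts = []
    for h in hits:
        if tail(h, rounds) < alpha:
            verdicts.append("confirmed")
        elif tail(0, h) < alpha:
            verdicts.append("rejected")
        else:
            verdicts.append("tentative")
    return verdicts


columns, labels = make_toy_data()
hits = boruta_sketch(columns, labels, rounds=20)
print(list(zip(hits, decide(hits, rounds=20))))
```

In this toy setting the strongly informative column should beat every shadow in every round, while the verdict for the noise columns depends on the random draw. A hybrid loop of the kind the abstract describes would, under the same assumptions, repeat the procedure after dropping rejected columns and check that classifier accuracy does not degrade.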

Similar resources

bootfs - Bootstrapped feature selection

The usage of the package is illustrated for three classification algorithms: pamr (Prediction Analysis for Microarrays, [3], implemented in the pamr R package), rf boruta (random forests with the Boruta algorithm for feature selection, [2], implemented in the Boruta R package) and scad (Support Vector Machines with Smoothly Clipped Absolute Deviation feature selection, [4], implementation in the ...

Evaluation of variable selection methods for random forests and omics data sets.

Machine learning methods and in particular random forests are promising approaches for prediction based on high dimensional omics data sets. They provide variable importance measures to rank predictors according to their predictive power. If building a prediction model is the main goal of a study, often a minimal set of variables with good prediction performance is selected. However, if the obj...

Feature Selection with the Boruta Package

This article describes an R package, Boruta, implementing a novel feature selection algorithm for finding all relevant variables. The algorithm is designed as a wrapper around a Random Forest classification algorithm. It iteratively removes the features that a statistical test shows to be less relevant than random probes. The Boruta package provides a convenient interface to the algorith...

Feature Selection and Predictive Modeling of Housing Data Using Random Forest

Predictive data analysis and modeling with machine learning techniques become challenging in the presence of too many explanatory variables or features. Too many features are known not only to slow algorithms down but also to decrease model prediction accuracy. This study involves a housing dataset with 79 quantitative and qualitative fea...

Intelligent application for Heart disease detection using Hybrid Optimization algorithm

Prediction of heart disease is very important because it is one of the leading causes of death around the world. Moreover, predicting heart disease at an early stage plays a major role in treatment and recovery and reduces the costs and side effects of diagnosis. Machine learning algorithms are able to identify an effective pattern for diagnosis and treatment of the disease and ident...

Publication date: 2015